NEXUS relies on Apache Solr to store metadata about tiles and Apache Cassandra to store the floating point array data associated with those tiles. Both Solr and Cassandra are distributed storage systems and can be run in a cluster.
Solr requires Apache Zookeeper to run in cluster mode (called SolrCloud). This notebook walks through the process of bringing up a 3 node Cassandra cluster, 3 node Zookeeper cluster, and a 3 node SolrCloud.
When initializing a Cassandra cluster, one or more nodes must be designated as 'seed' nodes. Seed nodes help bootstrap the internode communication protocol (gossip) that Cassandra nodes use to discover each other.
Therefore, the first step is to start one Cassandra container so that it can act as the seed node for the rest of our cluster.
Navigate to the directory containing the docker-compose.yml file for the infrastructure cluster.
$ cd ~/nexus/esip-workshop/docker/infrastructure
Use docker-compose to bring up the cassandra1 container.
$ docker-compose up -d cassandra1
Wait for the Cassandra node to become ready before continuing. Run the following command to follow the logs for cassandra1.
$ docker logs -f cassandra1
Wait for the Cassandra node to start listening for clients. It should only take a minute or so. Look for this line in the logs:
Starting listening for CQL clients on /0.0.0.0:9042
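If you would rather not watch the logs by hand, the wait can be scripted. The following is a minimal sketch, assuming it runs on the host where the docker CLI is available (not inside a container). The helper names `is_cassandra_ready` and `wait_for_cassandra`, the timeout, and the polling interval are all hypothetical choices made for illustration.

```python
import subprocess
import time

# Substring of the readiness line shown above.
READY_MARKER = "Starting listening for CQL clients"


def is_cassandra_ready(log_text: str) -> bool:
    """Return True once the CQL readiness line appears in the log text."""
    return READY_MARKER in log_text


def wait_for_cassandra(container: str = "cassandra1",
                       timeout_s: float = 300.0,
                       poll_s: float = 5.0) -> bool:
    """Poll `docker logs <container>` until the readiness line shows up.

    Returns False if the line has not appeared within timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = subprocess.run(
            ["docker", "logs", container],
            capture_output=True, text=True, check=False,
        )
        # Container output may arrive on either stream, so check both.
        if is_cassandra_ready(result.stdout + result.stderr):
            return True
        time.sleep(poll_s)
    return False
```

Calling `wait_for_cassandra()` blocks until the seed node reports that it is listening for CQL clients, or returns `False` after the timeout.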
Once the first Cassandra node is running, the rest of the infrastructure cluster can be brought online. The remaining 8 containers in the infrastructure can be started using the docker-compose command again.
Use docker-compose to bring up the remaining containers. Note: Make sure you are still in the same directory as Step 1, ~/nexus/esip-workshop/docker/infrastructure.
$ docker-compose up -d
Now there should be 9 containers running that make up our 3 node Cassandra cluster, 3 node Zookeeper cluster, and 3 node SolrCloud. We can use a variety of commands to verify that our cluster is active and healthy.
List all running docker containers.
$ docker ps
The output should look similar to this:
CONTAINER ID   IMAGE                 COMMAND                  CREATED        STATUS        PORTS                                         NAMES
90d370eb3a4e   nexusjpl/jupyter      "tini -- start-not..."   30 hours ago   Up 30 hours   0.0.0.0:8000->8888/tcp                        jupyter
cd0f47fe303d   nexusjpl/nexus-solr   "docker-entrypoint..."   30 hours ago   Up 30 hours   8983/tcp                                      solr2
8c0f5c8eeb45   nexusjpl/nexus-solr   "docker-entrypoint..."   30 hours ago   Up 30 hours   8983/tcp                                      solr3
27e34d14c16e   nexusjpl/nexus-solr   "docker-entrypoint..."   30 hours ago   Up 30 hours   8983/tcp                                      solr1
247f807cb5ec   cassandra:2.2.8       "/docker-entrypoin..."   30 hours ago   Up 30 hours   7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp   cassandra3
09cc86a27321   zookeeper             "/docker-entrypoin..."   30 hours ago   Up 30 hours   2181/tcp, 2888/tcp, 3888/tcp                  zk1
33e9d9b1b745   zookeeper             "/docker-entrypoin..."   30 hours ago   Up 30 hours   2181/tcp, 2888/tcp, 3888/tcp                  zk3
dd29e4d09124   cassandra:2.2.8       "/docker-entrypoin..."   30 hours ago   Up 30 hours   7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp   cassandra2
11e57e0c972f   zookeeper             "/docker-entrypoin..."   30 hours ago   Up 30 hours   2181/tcp, 2888/tcp, 3888/tcp                  zk2
2292803d942d   cassandra:2.2.8       "/docker-entrypoin..."   30 hours ago   Up 30 hours   7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp   cassandra1
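Scanning the listing by eye works, but the check can also be automated. This is a rough sketch, assuming the docker CLI is on the PATH of the machine running it. The expected names are taken from the NAMES column above; the helper functions `missing_containers` and `running_container_names` are hypothetical names introduced here.

```python
import subprocess

# The nine infrastructure containers named in the listing above.
EXPECTED = {
    "cassandra1", "cassandra2", "cassandra3",
    "zk1", "zk2", "zk3",
    "solr1", "solr2", "solr3",
}


def missing_containers(running_names):
    """Return the expected containers that are not in running_names, sorted."""
    return sorted(EXPECTED - set(running_names))


def running_container_names():
    """List the names of running containers, one per line of docker output."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()
```

`missing_containers(running_container_names())` returns an empty list when all nine containers are up. Extra containers, such as jupyter, are ignored.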
Get the Cassandra cluster status by running nodetool status inside the cassandra1 container.
$ docker exec cassandra1 nodetool status
You should see 3 cluster nodes:
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load     Tokens  Owns (effective)  Host ID                               Rack
UN  172.18.0.2  4.8 GB   256     35.3%             d9a0d273-b11c-41dd-9da1-cb77882f275f  rack1
UN  172.18.0.5  4.42 GB  256     33.2%             d68d9ea7-04a0-4eaf-b9c6-333b606bd2b1  rack1
UN  172.18.0.7  4.16 GB  256     31.5%             6f8683f9-abf8-4466-87bc-a5faa048956d  rack1
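Each node row begins with a two-letter code: U or D for up/down, followed by N, L, J, or M for the state. A healthy cluster shows UN (Up, Normal) for every node. The sketch below tallies those codes from captured `nodetool status` text. The function name `count_nodes_by_state` is a hypothetical helper, and the parsing assumes the column layout shown above.

```python
def count_nodes_by_state(nodetool_output: str) -> dict:
    """Tally node rows by their two-letter status/state code (e.g. 'UN')."""
    counts = {}
    for line in nodetool_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        code = fields[0]
        # Node rows start with U/D (up/down) followed by N/L/J/M (state);
        # header lines such as "--" or "|/" do not match this pattern.
        if len(code) == 2 and code[0] in "UD" and code[1] in "NLJM":
            counts[code] = counts.get(code, 0) + 1
    return counts
```

For the sample output above, the function should report three nodes under the `UN` key. Any other key (for example `DN` or `UJ`) signals a node that is down or still joining.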
Get the status of the SolrCloud by running the cell below.
# TODO Run this cell to get the status of the Solr Cluster. You should see a collection called
# 'nexustiles' with 3 shards spread across all 3 nodes.
import requests
import json
response = requests.get('http://solr1:8983/solr/admin/collections?action=clusterstatus&wt=json')
print(json.dumps(response.json(), indent=2))
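The full cluster status JSON is verbose, so a condensed view can make the shard layout easier to check. The following is a minimal sketch, assuming the response follows the usual CLUSTERSTATUS layout (`cluster` → `collections` → collection name → `shards` → `replicas`, with `node_name` and `state` on each replica). The function name `summarize_collection` is a hypothetical helper, not part of NEXUS or Solr.

```python
def summarize_collection(cluster_status: dict,
                         collection: str = "nexustiles") -> dict:
    """Map each shard name to a sorted list of (node_name, state) pairs."""
    # Assumed layout: cluster -> collections -> <name> -> shards -> replicas.
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    summary = {}
    for shard_name, shard in shards.items():
        summary[shard_name] = sorted(
            (replica["node_name"], replica["state"])
            for replica in shard["replicas"].values()
        )
    return summary
```

With the `response` object from the cell above, `summarize_collection(response.json())` should return one entry per shard. For the cluster built in this notebook, that means three shards whose replicas are all in the `active` state.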